Article Series: iPAS Exam Preparation Notes (3 / 3)

iPAS Exam Preparation Notes - Operational Intelligence Analyst (Entry Level)

After completing two iPAS certifications, I noticed there is another exam for the Operational Intelligence Analyst certification on 2026/10/31. I decided to give it a try, although after reading through the material, I found it to be quite different from what I had imagined. As per my usual practice, after speed-reading the two official handouts, I had AI organize them into notes that I find easier to study. However, this subject lacks past exam papers, making it impossible to estimate potential scores. Furthermore, the handouts only cover basic content, and many exam questions fall within the syllabus but are not explicitly in the handouts. Honestly, this makes me feel a bit uneasy about taking the exam.

Basic Knowledge of Operational Intelligence

Operational Intelligence vs. Business Intelligence

Business Intelligence (BI) is the use of information technology by enterprises to organize actionable information from daily operational data, helping companies improve operational performance and competitiveness.

Operational Intelligence (OI) applies business intelligence to the dynamic systems within an enterprise's operational model. It goes beyond historical reporting, emphasizing fast, real-time, and integrated information flows, organizing key information and critical indicators from processes such as R&D, procurement, production, marketing, finance, accounting, and human resources into a basis for decision-making.

| Concept | Focus | Typical Use |
| --- | --- | --- |
| Business Intelligence (BI) | Collecting, integrating, analyzing, and presenting enterprise data. | Reports, dashboards, trend analysis, KPI tracking |
| Operational Intelligence (OI) | Integrating BI analysis into enterprise operational processes to support real-time and multi-dimensional decision-making. | Production planning, sales analysis, financial analysis, HR performance |
| Big Data Analytics | Processing structured, semi-structured, and unstructured data; data-driven. | Social media analysis, customer behavior prediction, smart manufacturing |

Operational Intelligence System Architecture

Operational Intelligence systems typically consist of three layers: the Data Layer, the Integration Layer, and the Analytics Layer. Data is first generated by various operational systems, then enters a data warehouse or data platform for integration, and is finally used for decision-making through analytics engines and data presentation tools.

| Layer | Description | Common Technologies |
| --- | --- | --- |
| Data Layer | Data sources and operational systems. | ERP, CRM, SCM, POS, sensors, external data |
| Integration Layer | Cleaning, transforming, and integrating data from different sources. | Data warehouse, ETL, Metadata Repository |
| Analytics Layer | Analyzing data and creating reports. | OLAP, data mining, statistical analysis, BI tools |

KPIs and the Balanced Scorecard

A Key Performance Indicator (KPI) is a metric used to measure whether an organization is moving toward its goals. One of the tasks of operational intelligence is to identify KPIs for departments, processes, and the overall organization, allowing managers to review, measure, and correct action plans.

The Balanced Scorecard (BSC) is a strategic management method that translates vision and strategy into specific indicators across four perspectives.

| Perspective | Focus Question | Example KPI |
| --- | --- | --- |
| Financial | Does it contribute to shareholders and financial results? | Revenue growth rate, gross margin, ROI |
| Customer | How do customers view the enterprise? | Customer satisfaction, renewal rate, complaint rate |
| Internal Process | Which processes must perform well? | Delivery fulfillment rate, yield rate, process cycle time |
| Learning and Growth | Does the organization have the ability to improve continuously? | Employee training hours, skill coverage, number of innovation proposals |

There is a causal relationship between the four perspectives of the Balanced Scorecard. Learning and growth serve as the foundation; once internal processes improve, they drive customer satisfaction, which is ultimately reflected in financial results. This causal structure is called a Strategy Map.


Applications of Operational Intelligence in Business Management

Operational Intelligence can be applied to enterprise functions such as production, sales, finance and accounting, and human resources. The key is to transform data from various processes into information that supports management actions.

| Enterprise Function | Operational Intelligence Use | Corresponding System |
| --- | --- | --- |
| Production Planning and Control | Tracking production quantities, scheduling, capacity, and actual vs. planned variances. | ERP, MRP, MES |
| Sales and Distribution Analysis | Analyzing customer, order, channel, promotion, and delivery data. | ERP, CRM, SCM, PLM |
| Finance and Accounting | Integrating operational processes and financial data to support management and financial accounting. | ERP Finance/Accounting Module |
| Human Resources | Quantifying HR performance and tracking human resource utilization status. | HRIS, BI dashboards |

Sales and distribution is a good example: the process usually involves several systems at once, and the table below shows how they divide the work.

| System | Role in Sales and Distribution |
| --- | --- |
| ERP | Stores sales activities, orders, delivery, shipping, invoicing, and master data. |
| CRM | Manages customer attributes, marketing activities, promotions, and channel interaction data. |
| SCM | Supports sales forecasting, distribution networks, inventory, transportation, and order allocation. |
| PLM | Manages product status, features, and lifecycle information, supporting marketing and pricing. |

For the complete definition of each system mentioned in this section, see Common Information Systems for Digital Enterprises later in this document.

Evaluating and Planning Operational Intelligence

Before introducing operational intelligence, one must evaluate whether it can solve current operational problems or support future business directions. Only after evaluation does the project move into planning, confirming whether the enterprise has sufficient technical and non-technical infrastructure.

| Stage | Focus | Description |
| --- | --- | --- |
| Operational Assessment | Identify operational problems to be solved. | E.g., reducing inventory, improving delivery times, increasing customer satisfaction. |
| Cost-Benefit Assessment | Compare benefits with implementation costs. | Benefits can include increased revenue, reduced costs, and increased market share. |
| Risk Assessment | Evaluate project risks. | Technical risks, project complexity, organizational risks, team risks. |
| Infrastructure Assessment | Check implementation conditions. | Hardware, middleware, databases, processes, data, application systems, metadata. |
| Project Planning | Establish plans and schedules. | WBS, task dependencies, resources, critical path, duration. |

Enterprise value can be measured in four directions.

| Question | Measurement Direction | Example |
| --- | --- | --- |
| Better? | Has quality or satisfaction improved? | Higher yield, higher customer satisfaction |
| Cheaper? | Have costs been reduced? | Decreased development, communication, and inventory costs |
| Faster? | Has speed increased? | Production efficiency, time-to-market, service response speed |
| Do more? | Has capability or scope expanded? | New customer sources, market share, number of system users |

Gantt Charts, PERT, and CPM

Operational intelligence projects can be planned, scheduled, and controlled using Gantt charts, PERT, and CPM.

| Tool | Suitable Use | Focus |
| --- | --- | --- |
| Gantt Chart | Simple project scheduling and progress monitoring. | Shows activity time and progress, but is not good at showing activity dependencies. |
| PERT | Estimation when task duration is uncertain. | Uses three time estimates: optimistic, most likely, and pessimistic. |
| CPM | Finding the shortest project completion time and critical path. | Activities on the critical path directly affect the project completion date. |

Common steps for PERT and CPM:

  1. Define the project and create a Work Breakdown Structure (WBS).
  2. Identify all activities and their sequence.
  3. Draw an activity network diagram.
  4. Assign time and cost estimates to each activity.
  5. Calculate the longest time path in the entire network (the critical path).
  6. Use the network diagram to assist in planning, scheduling, and controlling the project.

PERT Expected Time Formula

Given optimistic time O, most likely time M, and pessimistic time P, the PERT expected time is:

$$TE = \frac{O + 4M + P}{6}$$

The formula is essentially a weighted average, with the numerator being O×1+M×4+P×1. 4 is the weight of the most likely time (making it dominate the estimate), and 6 is the sum of the weights (1+4+1). Both are fixed constants, independent of the number of activities or the shape of the network diagram.

The variance is:

$$\sigma^2 = \left(\frac{P - O}{6}\right)^2$$

Taking the activity "AI Customer Service Model Training" as an example, the engineer estimates 2 days optimistic (everything goes smoothly), 5 days most likely (normal progress), and 14 days pessimistic (serious errors):

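$$TE = \frac{2 + 4 \times 5 + 14}{6} = \frac{36}{6} = 6 \text{ days}$$

$$\sigma^2 = \left(\frac{14 - 2}{6}\right)^2 = 2^2 = 4, \quad \sigma = 2 \text{ days}$$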

Although 5 days is the most likely duration, the pessimistic estimate is as long as 14 days, so the weighted expected time is pulled up to 6 days, which is the number entered in the network diagram. The standard deviation of 2 days is an input for subsequent risk analysis.

To answer probability questions such as "What is the confidence level of completing within 8 days?", however, one cannot simply apply a normal distribution to a single activity, because a single activity's duration approximates a Beta distribution. The standard approach is to first sum the expected times and variances of all activities along the critical path to obtain the completion distribution of the entire project. By the Central Limit Theorem, the sum of many activity durations approaches a normal distribution, and the probability is estimated from that. If the project's expected completion time is 6 days and its standard deviation is 2 days, 8 days is exactly +1 standard deviation, so the probability is about 84% (50% below the mean, plus 34% between the mean and +1σ; see Descriptive Statistics and Common Statistical Concepts for the breakdown).
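The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names `pert_estimate` and `normal_cdf` are illustrative, and the final probability reuses the text's assumption that the whole project has a mean of 6 days and a standard deviation of 2 days.

```python
import math

def pert_estimate(optimistic, most_likely, pessimistic):
    """Return (expected time, variance) from the PERT three-point formulas."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    variance = ((pessimistic - optimistic) / 6) ** 2
    return expected, variance

def normal_cdf(x, mean, std_dev):
    """P(X <= x) for a normal distribution, computed with the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (std_dev * math.sqrt(2))))

# "AI Customer Service Model Training": O = 2, M = 5, P = 14 days.
te, var = pert_estimate(2, 5, 14)
sigma = math.sqrt(var)

# Assumption from the text: project-level mean 6 days, standard deviation 2 days.
prob_within_8 = normal_cdf(8, mean=6, std_dev=2)

print(te, var, sigma)           # 6.0 4.0 2.0
print(round(prob_within_8, 4))  # 0.8413
```

For a real project, the mean and variance passed to `normal_cdf` would be the sums over the critical-path activities rather than the values of one activity.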

PERT/CPM Network Diagram Illustration

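[Diagram: activity network. A (start) leads to B (2 days) and C (3 days); B and C lead to D (4 days); C leads to E (5 days); D and E lead to F (end).]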

Each node in the diagram represents an activity (A is start, F is end), and the arrows represent the dependency relationship between activities. The pointed-to activity must wait for all preceding activities to complete before it can start. The number of days on the node is the expected time calculated for each activity using the PERT formula. Summing the days for each path:

  • A→B→D→F: 2 + 4 = 6 days
  • A→C→D→F: 3 + 4 = 7 days
  • A→C→E→F: 3 + 5 = 8 days

The longest path, A→C→E→F, is the critical path (red nodes). The project only ends when all paths are completed, so the total duration is determined by the longest path, which is 8 days. The "shortest completion time" mentioned by CPM refers precisely to the length of this longest path; it cannot be rushed any faster.

Activities on the critical path have no slack: if C or E is delayed by one day, the total duration increases by one day. Activities on non-critical paths have float. Take B as an example. D must wait for both B and C to complete, and since D takes 4 days, it can start as late as the end of day 4 and still finish together with E on day 8. B finishes on day 2 at the earliest, so it can be delayed by up to 2 days without affecting the project duration; any further delay pushes the project back.
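The same result can be reproduced with a forward and backward pass over the network. The sketch below is illustrative only: the durations are read off the path sums above, and treating A and F as zero-duration start and end markers is an assumption, since the text gives no days for them.

```python
# Durations and predecessors for the example network.
# Assumption: A (start) and F (end) take 0 days.
durations = {"A": 0, "B": 2, "C": 3, "D": 4, "E": 5, "F": 0}
predecessors = {
    "A": [], "B": ["A"], "C": ["A"],
    "D": ["B", "C"], "E": ["C"], "F": ["D", "E"],
}
order = ["A", "B", "C", "D", "E", "F"]  # already a topological order

# Forward pass: an activity starts once all of its predecessors have finished.
earliest_finish = {}
for act in order:
    start = max((earliest_finish[p] for p in predecessors[act]), default=0)
    earliest_finish[act] = start + durations[act]
project_duration = earliest_finish["F"]

# Backward pass: an activity must finish before the latest start of its successors.
successors = {a: [b for b in order if a in predecessors[b]] for a in order}
latest_finish = {}
for act in reversed(order):
    latest_finish[act] = min(
        (latest_finish[s] - durations[s] for s in successors[act]),
        default=project_duration,
    )

# Float = how long an activity can slip without delaying the project.
total_float = {a: latest_finish[a] - earliest_finish[a] for a in order}
critical_path = [a for a in order if total_float[a] == 0]

print(project_duration)  # 8
print(critical_path)     # ['A', 'C', 'E', 'F']
print(total_float["B"])  # 2
```

Listing the zero-float activities yields the critical path here because this network has only one; a network with several critical paths would need an explicit path trace.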

Basic Data Analysis

Data Processing Chain

Data must go through a series of processes to transform from raw records into information that can be used for decision-making. The typical flow is that data first enters a database, is then organized into a data warehouse according to analysis purposes, then patterns are found through data mining, and finally, it is presented to decision-makers through visualization.

| Stage | Focus |
| --- | --- |
| Data | Information that can be recorded, which may be numbers, text, images, or sound. |
| Database | Storing daily operational data. |
| Data Warehouse | Integrating data from multiple sources for analysis purposes. |
| Data Mining | Finding correlations, patterns, trends, or prediction rules. |
| Data Visualization | Presenting analysis results through charts, dashboards, or reports. |

Data Sources and Datafication

Data is not necessarily just numbers; any information that can be recorded can become data. Enterprise data sources include transaction data, customer data, supply chain data, sensor data, customer service records, social media text, images, and video files.

Datafication is the process of converting phenomena that are not easily quantifiable into analyzable data. For example, converting customer service calls into text for sentiment analysis, or converting equipment vibration signals into time series to predict failures.

Structured, Semi-structured, and Unstructured Data

| Data Type | Characteristics | Example | Pros | Limitations |
| --- | --- | --- | --- | --- |
| Structured Data | Fixed fields, format, and order. | Sales data tables, member data tables | Easy to query, integrate, and analyze. | Low flexibility; data not conforming to field rules is hard to store. |
| Semi-structured Data | Has field concepts, but fields can be added/removed. | CSV, JSON, XML | Easy to exchange and extend. | Fields are not necessarily consistent; higher data governance requirements. |
| Unstructured Data | No fixed fields or fixed format. | Images, videos, Email, web pages, customer service recordings | Can preserve rich content. | Usually requires conversion before analysis; information may be lost during conversion. |

Data Quality and Preprocessing

Data quality directly affects analysis results. Common data quality problems include missing values, duplicate data, inconsistent formats, inconsistent units of measurement, inconsistent field definitions, and outliers.

| Problem | Description | Handling Approach |
| --- | --- | --- |
| Missing Values | Field not filled or unobtainable. | Imputation, deletion, marking the reason for missing. |
| Duplicate Data | The same entity is recorded repeatedly. | Deduplication, primary key comparison, merging records. |
| Inconsistent Format | Chaotic date, phone, address formats. | Standardizing formats. |
| Inconsistent Units | Mixing kilograms and grams, TWD and USD. | Unifying units and retaining conversion rules. |
| Inconsistent Definition | Different departments define the same field differently. | Establishing a data dictionary and common definitions. |
| Outliers | Outside reasonable range or violating rules. | Investigating the cause, correcting, or retaining and marking. |

Common meanings of data quality:

  • Accuracy: Does the data reflect the true state?
  • Reliability: Are the data source and production process credible?
  • Consistency: Are definitions consistent across different systems or fields?
  • Completeness: Are necessary fields complete?
  • Relevance: Is the data relevant to the originally set business goals?

Missing Value Handling Methods

Handling missing values is one of the most common tasks in data preprocessing, and different methods are suitable for different scenarios.

| Method | Description | Suitable Scenario |
| --- | --- | --- |
| Listwise Deletion | Directly deleting the entire record containing missing values. | Very low missing ratio; deletion does not affect sample representativeness. |
| Pairwise Deletion | Excluding the record only when the field is used; other analyses still include it. | Multivariate analysis; hope to retain more data. |
| Mean/Median Imputation | Filling with the mean or median of the field. | Numerical fields; use mean for normal distribution, median for skewed distribution. |
| Linear Interpolation | Using known values before and after the missing point for linear interpolation. | Continuous data like time series; retains local trends, avoids flattening changes caused by mean imputation. |
| Mode Imputation | Filling with the most frequent category in the field. | Categorical fields. |
| Regression Imputation | Using other fields to build a regression model to predict missing values. | Missing field has a clear correlation with other fields. |
| KNN Imputation | Filling with the average of the K most similar records. | Clear multi-field features, sufficient samples. |
| Multiple Imputation | Generating multiple sets of possible values and merging analysis results. | High-quality research, complex missing mechanisms. |
| Retain and Mark | Retaining missing, adding an "is missing" flag field. | Missing itself has meaning (e.g., not filling is also information). |
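To make the difference between constant-value imputation and linear interpolation concrete, here is a small sketch in plain Python. The `daily_sales` series and the helper names are hypothetical, and the interpolation helper deliberately leaves leading or trailing gaps untouched because they have no neighbour on one side.

```python
from statistics import mean, median

def impute_constant(values, fill):
    """Replace every None with one fill value (mean, median, or mode)."""
    return [fill if v is None else v for v in values]

def interpolate_linear(values):
    """Fill interior None gaps on a straight line between known neighbours."""
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result

daily_sales = [100, None, 120, 130, None, None, 160]  # hypothetical time series
observed = [v for v in daily_sales if v is not None]

mean_filled = impute_constant(daily_sales, mean(observed))
median_filled = impute_constant(daily_sales, median(observed))
interpolated = interpolate_linear(daily_sales)
is_missing = [v is None for v in daily_sales]  # "retain and mark" flag column

print(mean_filled[1])  # 127.5
print(interpolated)    # [100, 110.0, 120, 130, 140.0, 150.0, 160]
```

Mean imputation writes 127.5 into every gap and flattens the upward trend, whereas interpolation keeps the series rising, which is the trade-off the table describes.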

Data Storage: From Databases to Data Lakehouses

| Concept | Characteristics |
| --- | --- |
| Database | Stores daily operational (OLTP) data, highly normalized. |
| Data Warehouse | Integrates historical data from multiple operational systems, fixed Schema, for enterprise-level analysis. |
| Data Mart | A subset of a data warehouse, focusing on specific departments or themes (e.g., marketing, finance). |
| Data Lake | Stores structured, semi-structured, and unstructured data in raw format; Schema is defined only during analysis (Schema-on-Read). |
| Data Lakehouse | Combines the management capabilities of a data warehouse with the flexibility of a data lake, featuring both ACID transactions and multi-format storage. |

Database vs. Data Warehouse

Both store data, but serve opposite purposes:

  • Databases are "operational" oriented, storing current real-time data, frequent writes, and highly normalized to reduce redundancy (corresponds to OLTP).
  • Data warehouses are "analysis" oriented, integrating historical data from multiple sources, query-focused, and denormalized to speed up aggregation (corresponds to OLAP).

Databases answer "what is the current state," while data warehouses answer "what is the long-term trend."

ETL and ELT

The foundation of a data warehouse is ETL (Extract, Transform, Load), which is extracting data from source systems, cleaning and transforming it, and then loading it into the data warehouse.

The study guide also uses ECCD to describe the data processing flow:

| Step | Description |
| --- | --- |
| Extract | Taking data from raw data sources. |
| Clean | Confirming data quality, handling missing, erroneous, and inconsistent data. |
| Conform | Unifying definitions, formats, and dimensions across data sources. |
| Deliver | Delivering usable data to application systems or decision-makers. |

When planning an ETL system, both requirements and architecture must be considered simultaneously.

| Type | Consideration Items |
| --- | --- |
| Requirements Analysis | Business, regulations, quality, security, integration, schedule, backup, delivery, skills, resources |
| Architecture Design | Tool procurement or self-development, batch or streaming, schedule automation, exception handling, quality control, recovery and restart, metadata, data security |

Differences between ETL and ELT

ETL and ELT consist of the same three steps, but the difference lies in the sequence of Transform and Load, which also changes where Transform is executed:

| Step | ETL | ELT |
| --- | --- | --- |
| Extract | Extract raw data from source systems. | Extract raw data from source systems. |
| Transform | Clean and apply business rules in external tools before loading. | Transform using the platform's computing power after loading. |
| Load | Finally, write the organized clean data into the data warehouse. | First, write raw data directly into the data lake or data lakehouse. |

  • ETL is suitable for traditional data warehouses. Taking monthly financial closing as an example, currencies are unified, duplicate transactions are removed, and missing values are filled in an external tool before loading into the warehouse. Data quality is high, but the entire process must be rerun if business rules change.
  • ELT is suitable for data lakehouses and cloud data platforms. Taking an operational analysis platform as an example, order, POS, and customer service records are loaded in full, and then financial summaries, sales analysis, and customer segmentation datasets are generated according to needs. Raw data is kept intact, and when new analysis needs arise, one can go back and re-transform without being limited by the initial design.
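The ordering difference can be sketched with Python's built-in `sqlite3` module standing in for the warehouse or platform. Everything here is hypothetical: the sample orders, the table names, and the fixed exchange rate are assumptions made for illustration, not part of the study guide.

```python
import sqlite3

# Hypothetical raw orders: mixed currencies and one duplicated transaction.
raw_orders = [
    ("T001", 100.0, "USD"),
    ("T002", 3200.0, "TWD"),
    ("T002", 3200.0, "TWD"),  # duplicate record
]
USD_TO_TWD = 32.0  # assumed fixed rate, for illustration only

conn = sqlite3.connect(":memory:")

# ETL: transform outside the warehouse, then load only the clean rows.
clean = {}
for order_id, amount, currency in raw_orders:
    twd = amount * USD_TO_TWD if currency == "USD" else amount
    clean[order_id] = twd  # keying by order_id drops the duplicate
conn.execute("CREATE TABLE etl_orders (order_id TEXT PRIMARY KEY, amount_twd REAL)")
conn.executemany("INSERT INTO etl_orders VALUES (?, ?)", clean.items())

# ELT: load the raw rows untouched, then transform inside the platform with SQL.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount REAL, currency TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)
conn.execute(
    f"""
    CREATE VIEW elt_orders AS
    SELECT DISTINCT order_id,
           CASE currency WHEN 'USD' THEN amount * {USD_TO_TWD} ELSE amount END
               AS amount_twd
    FROM raw_orders
    """
)

etl_total = conn.execute("SELECT SUM(amount_twd) FROM etl_orders").fetchone()[0]
elt_total = conn.execute("SELECT SUM(amount_twd) FROM elt_orders").fetchone()[0]
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]

print(etl_total, elt_total, raw_count)  # 6400.0 6400.0 3
```

Both routes give the same clean total, but only the ELT side still holds all three raw rows, so a new transformation can be defined later without re-extracting from the source.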

Why ELT has risen

  • Early database storage costs were high, and computing and storage were tied to the same machine; transforming and reducing volume externally before loading was a necessary practice at the time.
  • Cloud object storage costs have dropped significantly, making full-volume loading feasible; modern cloud data platforms separate computing from storage, allowing for on-demand scaling of computing power for transformation within the platform.
  • ETL's aggregation and cleaning are destructive processes; once raw details are aggregated, they disappear. The need to retain raw data makes ELT a common choice on modern platforms.

OLTP and OLAP

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two different database operating modes for different purposes.

| Aspect | OLTP | OLAP |
| --- | --- | --- |
| Main Purpose | Processing daily transactions (add, modify, delete) | Analysis and decision support |
| Data Volume | Small per transaction, moderate total | Large amount of historical data |
| Operation Type | Large number of short transactions, frequent writes | Small number of complex queries, read-focused |
| Data Structure | Highly normalized | Denormalized (star, snowflake) |
| Typical System | ERP, POS, order systems | Data warehouse, BI platform |

Five OLAP Operations

Multi-dimensional analysis can be imagined as operating on a Data Cube. Suppose this cube has three dimensions: time (year/month/day), location (country/city), and product. Each cell stores the sales amount for that combination. The five operations are different ways of viewing this cube.

| Operation | Description and Example |
| --- | --- |
| Roll-up | Aggregating from fine to coarse along a dimension, e.g., summing daily sales of cities into monthly performance for the whole country. |
| Drill-down | The reverse of roll-up, expanding from coarse to fine, e.g., clicking from national performance to see details by city. |
| Slice | Fixing a single value for one dimension, taking a plane, e.g., looking only at "2026". |
| Dice | Setting limits on multiple dimensions, taking a smaller sub-cube, e.g., "2026 × Taipei × Mobile Phone". |
| Pivot | Not changing data, only changing the presentation direction of dimensions, e.g., swapping report rows and columns to read from a different perspective. |

The essence of the three sets of operations is different. Roll-up/Drill-down changes the granularity of data; Slice/Dice changes the scope of data; Pivot only changes the presentation angle.

Distinguishing Slice, Dice, and Drill-down

  • Slice vs. Dice: Both are filtering data; the difference is only in how many dimensions are fixed. Slice fixes one dimension and takes a plane (3D to 2D); Dice fixes multiple dimensions and takes a sub-cube.
  • Drill-down vs. Slice: Both are like narrowing the focus, but Drill-down goes down the same dimension to become finer (changing granularity), while Slice filters out other values of a dimension (changing scope), and the report will have one fewer dimension.
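The operations can be tried on a toy cube held in a Python dictionary. This is only a sketch: the cell values and the function names are made up, and a real OLAP engine would run these operations on the server rather than in application code.

```python
from collections import defaultdict

# Hypothetical cube cells: (year, month, city, product) -> sales amount.
cube = {
    (2026, 1, "Taipei", "Mobile Phone"): 500,
    (2026, 1, "Taipei", "Laptop"): 800,
    (2026, 2, "Taipei", "Mobile Phone"): 300,
    (2026, 1, "Kaohsiung", "Mobile Phone"): 200,
    (2025, 12, "Taipei", "Mobile Phone"): 400,
}

def roll_up_to_year(cells):
    """Roll-up: aggregate the time dimension from month to year."""
    totals = defaultdict(int)
    for (year, _month, city, product), amount in cells.items():
        totals[(year, city, product)] += amount
    return dict(totals)

def slice_year(cells, year):
    """Slice: fix one value on one dimension (time)."""
    return {k: v for k, v in cells.items() if k[0] == year}

def dice(cells, year, city, product):
    """Dice: restrict several dimensions at once to get a sub-cube."""
    return {k: v for k, v in cells.items()
            if k[0] == year and k[2] == city and k[3] == product}

def pivot(table):
    """Pivot: swap rows and columns of a {row: {column: value}} report."""
    flipped = defaultdict(dict)
    for row, cols in table.items():
        for col, value in cols.items():
            flipped[col][row] = value
    return dict(flipped)

yearly = roll_up_to_year(cube)  # drill-down is the return to the finer `cube`
sub_cube = dice(cube, 2026, "Taipei", "Mobile Phone")
report = {"Taipei": {"Mobile Phone": 800, "Laptop": 800}}

print(yearly[(2026, "Taipei", "Mobile Phone")])  # 800
print(len(slice_year(cube, 2026)))               # 4
print(sum(sub_cube.values()))                    # 800
print(pivot(report))  # {'Mobile Phone': {'Taipei': 800}, 'Laptop': {'Taipei': 800}}
```

Roll-up changes the key itself (the month is gone), slice and dice keep the keys but drop cells, and pivot leaves every value in place, which mirrors the granularity, scope, and presentation distinction above.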

Star Schema and Snowflake Schema

Data warehouses often organize data using Dimensional Modeling, splitting data into two types of tables. The Fact Table is placed in the center, recording measurement numbers of "what happened," such as sales amount and purchase quantity; Dimension Tables surround it, recording the background of these numbers, such as when, where, and which product was purchased. The difference between star and snowflake schemas lies in how dimension tables are arranged.

| Model | Characteristics | Trade-off |
| --- | --- | --- |
| Star Schema | Center is the fact table, surrounded by directly connected dimension tables; dimension attributes are flattened in the same table. | Simple structure, few JOINs, fast queries, but dimension tables have duplicate data. |
| Snowflake Schema | Re-normalizes dimension tables, splitting duplicate attributes into multi-layer sub-dimension tables. | Saves storage, reduces redundancy, but more JOINs, slower queries. |

Taking the product dimension as an example, the star schema flattens product name, category, and category manager into the same product dimension table. This hides a transitive dependency: product determines category, and category determines manager. The star schema tolerates this, so 1,000 electronic products will have the category and manager written 1,000 times. The snowflake schema extracts the category into an independent sub-table, eliminating this transitive dependency, effectively pushing the dimension table to the third normal form (3NF). The cost is an extra JOIN to look up the category name.

Additionally, normalization only affects dimension tables; the structure of the fact table is the same in both models. Fact tables only contain foreign keys and measurement values (e.g., product ID, customer ID, quantity, amount), no descriptive text, so there are no transitive dependencies to split.

Viewing Fact and Dimension Tables with Simulated Data

The sales fact table only has foreign keys and measurement values, no text descriptions:

| Date ID | Product ID | Customer ID | Quantity | Amount |
| --- | --- | --- | --- | --- |
| 20260301 | 101 | 501 | 2 | 1000 |
| 20260301 | 102 | 502 | 1 | 30000 |
| 20260302 | 101 | 503 | 5 | 2500 |

The star schema's product dimension table flattens descriptive attributes; category and manager will repeat:

| Product ID | Product Name | Category | Category Manager |
| --- | --- | --- | --- |
| 101 | Bluetooth Earphones | Electronics | Wang Xiaoming |
| 102 | Laptop | Electronics | Wang Xiaoming |
| 103 | Office Chair | Furniture | Li Dahu |

The snowflake schema extracts the category into a sub-table, the product table only keeps the category ID, and "Electronics/Wang Xiaoming" no longer repeats:

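| Product ID | Product Name | Category ID |
| --- | --- | --- |
| 101 | Bluetooth Earphones | C01 |
| 102 | Laptop | C01 |
| 103 | Office Chair | C02 |

| Category ID | Category | Category Manager |
| --- | --- | --- |
| C01 | Electronics | Wang Xiaoming |
| C02 | Furniture | Li Dahu |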
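Loading the simulated tables into an in-memory SQLite database shows the JOIN cost directly. This is a sketch under assumptions: the table and column names (`fact_sales`, `dim_product_star`, and so on) are invented here, and SQLite merely stands in for a warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fact_sales (date_id INT, product_id INT, customer_id INT,
                             quantity INT, amount INT);
    INSERT INTO fact_sales VALUES
        (20260301, 101, 501, 2, 1000),
        (20260301, 102, 502, 1, 30000),
        (20260302, 101, 503, 5, 2500);

    -- Star: one flattened product dimension.
    CREATE TABLE dim_product_star (product_id INT, product_name TEXT,
                                   category TEXT, category_manager TEXT);
    INSERT INTO dim_product_star VALUES
        (101, 'Bluetooth Earphones', 'Electronics', 'Wang Xiaoming'),
        (102, 'Laptop', 'Electronics', 'Wang Xiaoming'),
        (103, 'Office Chair', 'Furniture', 'Li Dahu');

    -- Snowflake: category split out into its own sub-dimension.
    CREATE TABLE dim_product_snow (product_id INT, product_name TEXT,
                                   category_id TEXT);
    INSERT INTO dim_product_snow VALUES
        (101, 'Bluetooth Earphones', 'C01'),
        (102, 'Laptop', 'C01'),
        (103, 'Office Chair', 'C02');
    CREATE TABLE dim_category (category_id TEXT, category TEXT,
                               category_manager TEXT);
    INSERT INTO dim_category VALUES
        ('C01', 'Electronics', 'Wang Xiaoming'),
        ('C02', 'Furniture', 'Li Dahu');
    """
)

# Star schema: one JOIN reaches the category name.
star_rows = conn.execute(
    """SELECT p.category, SUM(f.amount) FROM fact_sales f
       JOIN dim_product_star p ON p.product_id = f.product_id
       GROUP BY p.category"""
).fetchall()

# Snowflake schema: a second JOIN is needed to look up the category name.
snowflake_rows = conn.execute(
    """SELECT c.category, SUM(f.amount) FROM fact_sales f
       JOIN dim_product_snow p ON p.product_id = f.product_id
       JOIN dim_category c ON c.category_id = p.category_id
       GROUP BY c.category"""
).fetchall()

print(star_rows)       # [('Electronics', 33500)]
print(snowflake_rows)  # [('Electronics', 33500)]
```

The two queries return identical totals; the fact table is shared, and only the path from product to category differs.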

Star vs. Snowflake: Three Trade-off Aspects

Star and snowflake schemas each have their strengths. In practice, they are weighed across three aspects:

  • Query Speed: Star has fewer JOINs, faster; snowflake has more JOINs, slower.
  • Storage Space: Star tolerates duplication, takes more space; snowflake eliminates redundancy, saves more.
  • Maintainability: Snowflake keeps duplicate attributes in one place, update one record; star has the same attribute repeated in many rows, requiring all to be changed during updates, potentially causing inconsistencies (update anomalies).

However, in data warehouses, the weight of maintainability is lower than in transactional databases because warehouses are focused on read-based analysis and rarely modify historical data; the update burden caused by star duplication does not occur often. Therefore, most scenarios still prefer the star schema for its fast queries and intuitive SQL, and cloud platforms with low storage costs have further reinforced this choice.

Data Visualization and Chart Selection

The purpose of data visualization is to convey key points to decision-makers using appropriate charts without distorting the data.

The first step in choosing a chart is to think clearly about what relationship you want to express, then pick the corresponding chart.

| Analysis Purpose | Suitable Chart | Description |
| --- | --- | --- |
| Compare Categories | Bar chart, horizontal bar chart | Comparing quantities or sizes of different categories; horizontal bar charts are easier to read when there are many categories. |
| Show Trends | Line chart | Observing changes in values over time. |
| Show Composition | Pie chart, stacked bar chart | Seeing the proportion of parts in a whole; categories should not be too many. |
| Observe Distribution | Histogram, box plot | Histograms show the distribution shape of continuous values; box plots show median, quartiles, and outliers. |
| Observe Correlation | Scatter plot | Observing the relationship between two variables. |
| Show Cross Density | Heat map | Presenting the magnitude of 2D cross-values with color depth, e.g., sales heat by time period × region. |

Bar Chart vs. Histogram

  • Bar charts are commonly used for qualitative data, such as product categories, regions, and departments.
  • Histograms are commonly used for quantitative data, such as amounts, height, and working hours.

Common Misleading Visualizations

The same data can give completely different impressions depending on how it is drawn. The following are four common misleading techniques.

Truncating the Y-axis: If the Y-axis does not start from zero, small differences are magnified. For example, with values 85 and 90, the two bars look almost the same height when the Y-axis runs from 0 to 100; when it runs from 80 to 95, the 90 bar is drawn twice as tall as the 85 bar. When comparing absolute quantities, the Y-axis should start from zero; when local differences need to be emphasized, the axis range must be clearly marked.

Figure: Comparison of bar charts for the same set of values (85 and 90) with Y-axis ranges 0-100 and 80-95; the truncated axis on the right magnifies the gap.
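The distortion can be quantified as the ratio of the drawn bar heights. The helper below is a hypothetical illustration, not a charting API, and it assumes the truncated axis starts at 80 as in the comparison figure.

```python
def drawn_height_ratio(a, b, axis_min):
    """Ratio of the drawn bar heights of b to a when the axis starts at axis_min."""
    return (b - axis_min) / (a - axis_min)

full_axis = drawn_height_ratio(85, 90, axis_min=0)
truncated_axis = drawn_height_ratio(85, 90, axis_min=80)

print(round(full_axis, 3))       # 1.059
print(round(truncated_axis, 3))  # 2.0
```

A true difference of about 6% is drawn as a 100% difference once the axis starts at 80.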

3D Pie Chart: 3D perspective makes the sectors closer to the front appear larger than they actually are due to the angle, while the rear sectors are compressed, leading to distorted proportion interpretation. To show proportions, flat pie charts or bar charts are more accurate than 3D.

Figure: Four sectors each accounting for 25% are presented in 3D and flat pie charts; the 3D version's front sector looks larger than the rear, but the actual proportions are the same.

Dual Y-axis: When two data series are plotted against the left and right Y-axes respectively, the scale of each axis can be adjusted independently. With suitably chosen scales, two unrelated lines can be made to look as if they rise and fall together, implying a correlation that does not exist. To compare two sets of values, plot them against the same baseline to see the true relationship.

Figure: Two unrelated time series; the left chart uses dual Y-axes to adjust the scales, making the two lines appear synchronized; the right chart uses a single baseline, revealing they are actually unrelated.

Information Overload: Putting too many series or elements in one chart makes it impossible for the reader to grasp the key point. One chart should tell one story.

Descriptive Statistics and Common Statistical Concepts

Data analysis usually starts with descriptive statistics to understand the central tendency, dispersion, and distribution state of the data, and then proceeds to correlation analysis and hypothesis testing.

| Metric | Description | Characteristics |
| --- | --- | --- |
| Mean | Sum of all values divided by the number of records. | Easily affected by extreme values. |
| Median | The value in the middle after sorting. | Suitable for skewed distributions or when extreme values exist. |
| Mode | The value that appears most frequently. | Can also be used for categorical data. |
| Standard Deviation | Measuring the degree of data dispersion. | The larger the standard deviation, the more dispersed the data. |
| Quartiles | Cutting sorted data into four equal parts. | Box plots commonly use Q1, Q2, Q3. |
| Correlation Coefficient | Measuring the direction and strength of linear correlation. | No linear correlation does not mean no relationship; correlation does not mean causation. |
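Most of these metrics are available in Python's standard `statistics` module. The sketch below uses a hypothetical list of transaction amounts with one extreme value to show how the mean and median diverge; the choice of the population standard deviation and the "inclusive" quartile method are assumptions, since the text does not specify a convention.

```python
import statistics

amounts = [10, 12, 12, 14, 15, 16, 100]  # hypothetical; 100 is an extreme value

mean_value = statistics.mean(amounts)
median_value = statistics.median(amounts)
mode_value = statistics.mode(amounts)
std_dev = statistics.pstdev(amounts)  # population standard deviation
q1, q2, q3 = statistics.quantiles(amounts, n=4, method="inclusive")

print(round(mean_value, 2), median_value, mode_value)  # 25.57 14 12
print(q1, q2, q3)                                      # 12.0 14.0 15.5
```

The single value of 100 drags the mean to about 25.6 while the median stays at 14, which is why the median is preferred for skewed data.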

Common statistical and probability concepts:

  • Conditional Probability: The probability of one event occurring given that another event has occurred.
  • Independent Events: The occurrence of one event does not affect the probability of another event occurring.
  • Confidence Interval: Estimating the possible range of population parameters using an interval.

Bayes' Theorem

A reverse application of conditional probability, using new evidence to update original probability judgments. General conditional probability answers "Given that he is a target customer, what is the probability he clicks the ad?"; Bayes' Theorem answers the reverse question, "Given that he clicked the ad, what is the probability he is a target customer?". In practice, most of the evidence that can be directly observed is the latter (clicks, purchases, revisits), so this reverse deduction is the direction commonly used in marketing and customer analysis.

Let's walk through it with concrete numbers. Suppose target customers account for only 10% of the exposed population, and the click-through rate of this ad for target customers is 60%, while it is only 5% for others. Estimating per 1,000 exposures, 60 clicks are expected from target customers (100 people × 60%) and 45 from others (900 people × 5%), totaling 105. Under the condition "given that someone clicked the ad," the probability that they are a target customer is 60 ÷ 105, about 57.1%. This result depends only on those three ratios (prior 10% and two click-through rates), regardless of how many clicks are actually recorded in the backend.
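The same calculation can be written directly from the three ratios. The function name `posterior` and its parameter names are illustrative; the numbers are the ones used in the walkthrough above.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(hypothesis | evidence) by Bayes' theorem for a two-way split."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

# Prior: 10% target customers; click-through rate 60% for them, 5% for others.
p_target_given_click = posterior(
    prior=0.10, likelihood_if_true=0.60, likelihood_if_false=0.05
)

print(round(p_target_given_click, 3))  # 0.571
```

No exposure count appears anywhere in the function, which confirms that the result depends only on the prior and the two click-through rates.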

Hypothesis Testing, p-value, and Type I/Type II Errors

Hypothesis testing usually starts by establishing a null hypothesis, which is a conservative stance of "no difference, no effect," such as "the conversion rate of the new webpage is the same as the old version," and then seeing if the data is sufficient to overturn it.

The p-value is the probability of seeing a gap at least as large as the one observed, assuming there is really no difference in conversion rates between the new and old versions. When the p-value is lower than the pre-set significance level (e.g., 0.05), the observed result is too unlikely to be something that happens under "no difference," so the null hypothesis is overturned (rejected). Note that the p-value is not "the probability that the null hypothesis is true," nor is it the size of the effect; these are two of the most common misinterpretations.

Errors can be made whether overturning or not, divided into two types:

  • Type I Error (False Positive): The null hypothesis is actually true but is overturned, misjudging "no effect" as "having an effect."
  • Type II Error (False Negative): The null hypothesis is actually false but is not overturned, misjudging "having an effect" as "no effect."

The significance level is, in effect, the upper limit on the Type I error probability one is willing to accept.
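As a rough illustration of the p-value logic, the sketch below runs a two-sided two-proportion z-test on made-up conversion counts (100 of 2,000 visitors for the old page, 130 of 2,000 for the new one). The z-test is an assumption chosen for this example; the handout does not prescribe a specific test here, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both conversion rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # shared rate assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


ALPHA = 0.05  # significance level = accepted upper limit on Type I error
z, p = two_proportion_z_test(100, 2000, 130, 2000)
reject_null = p < ALPHA
print(f"z = {z:.2f}, p = {p:.3f}, reject H0: {reject_null}")
```

With these invented counts the p-value falls just under 0.05, so the null hypothesis is rejected; if the new page truly had no effect, that rejection would be a Type I error.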

t-test, F-test, Chi-square test

Statistical testing uses sample data on hand to judge whether a hypothesis about the population is true; the preceding null hypothesis and p-value are its common judgment logic. Which test to use depends on the data type and the object to be compared. Common ones are the following three.

Test | Object of Comparison/Verification | Typical Question
t-test | Means of one or two groups | Is there a difference in the average transaction amount between two groups of customers?
F-test | Variances of two groups (degree of data dispersion) | Is the quality fluctuation of two production lines consistent?
Chi-square test | Whether two categorical variables are associated | Is there an association between gender and whether a purchase was made?
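For the chi-square row, here is a minimal sketch of how the statistic is computed from a 2×2 contingency table (gender × purchased). The counts are invented, the function name is hypothetical, and 3.841 is the standard critical value for 1 degree of freedom at a 0.05 significance level.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (count - expected) ** 2 / expected
    return chi2


# Rows: two gender groups; columns: purchased / did not purchase (made-up data).
observed = [[30, 70], [50, 50]]
chi2 = chi_square_statistic(observed)
CRITICAL_DF1_ALPHA_05 = 3.841
associated = chi2 > CRITICAL_DF1_ALPHA_05
print(round(chi2, 3), associated)  # 8.333 True
```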

ANOVA and F-test

To compare the means of three or more groups (e.g., whether the average revenue of three stores is the same), Analysis of Variance (ANOVA) is used, and its test statistic is the F-value. It is called "Analysis of Variance" but compares means because it compares "between-group variance" and "within-group variance." When the between-group difference is significantly larger than the random fluctuation within the group, it is judged that the means of the groups are not all equal.
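The between-group versus within-group comparison can be sketched as follows. This is a minimal one-way ANOVA F-value calculation using only the standard library; the store revenues are made-up numbers, and turning the F-value into a p-value (which needs the F distribution) is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)                   # number of groups
    n = sum(len(g) for g in groups)   # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical daily revenue of three stores
stores = [[10, 12, 11], [20, 22, 21], [30, 32, 31]]
f_value = one_way_anova_f(stores)
print(f_value)
```

Here the store means (11, 21, 31) are far apart while each store's own values barely move, so the F-value is very large; identical groups would give an F-value of 0.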

Empirical Rule and Chebyshev's Theorem

Both answer the same question, "how much data is covered within ±k standard deviations of the mean," the difference being the requirement for the shape of the data.

  • The Empirical Rule only applies to data that is approximately normal (bell-shaped symmetric). Because the shape is known, the estimate can be very precise; about 68%, 95%, and 99.7% of the data is covered within ±1, 2, and 3 standard deviations, respectively.
  • Chebyshev's Theorem does not require any distribution shape. Because the shape is uncontrollable, it can only give a conservative lower bound; at least $1 - \frac{1}{k^2}$ of the data falls within ±k standard deviations of the mean (for k > 1).

The common 75% is calculated by substituting k=2: $1 - \frac{1}{2^2} = \frac{3}{4}$; for k=3 it is $1 - \frac{1}{3^2} = \frac{8}{9}$, about 88.9%.

So, for the same ±2 standard deviations, when data is approximately normal, one can say "about 95%"; when the shape is unknown or clearly skewed, one retreats to citing Chebyshev and says "at least 75%." The former is a precise estimate gained from a beautiful shape, and the latter is a minimum guarantee gained from not picking a shape.
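To see the two statements side by side, the sketch below computes the exact normal coverage with `statistics.NormalDist` and the Chebyshev lower bound for k = 1, 2, 3. The variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

coverage = {}
for k in (1, 2, 3):
    normal_share = std_normal.cdf(k) - std_normal.cdf(-k)  # exact, if data is normal
    chebyshev_floor = 1 - 1 / k**2                         # guaranteed for any shape
    coverage[k] = (normal_share, chebyshev_floor)
    print(f"±{k} SD: normal ≈ {normal_share:.1%}, Chebyshev ≥ {chebyshev_floor:.1%}")
```

For k = 1 the Chebyshev bound is 0%, which is why the theorem is only informative for k greater than 1.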

Chebyshev's Theorem Calculation Example

The theorem can be used in two directions.

Given several standard deviations, find the minimum proportion: A factory's daily output averages 500 units, standard deviation 20 units. At least what proportion of days falls between 460 and 540 units? 460 and 540 are both 40 units from the mean, which is exactly 2 standard deviations (k=2). Substituting gives $1 - \frac{1}{2^2} = \frac{3}{4}$, so at least 75%.

Given the proportion to be guaranteed, find the interval: A distribution center averages 40 minutes for delivery, standard deviation 8 minutes. To guarantee that at least 84% of orders fall within a symmetric interval, how large should the interval be? Solving $1 - \frac{1}{k^2} = 0.84$ gives $k^2 = 6.25$, so $k = 2.5$. The interval is $40 \pm 2.5 \times 8 = [20, 60]$ minutes.
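Both directions of the calculation can be written as two small helper functions. This is a sketch with hypothetical function names, reproducing the factory and distribution-center examples above.

```python
import math


def chebyshev_min_proportion(k):
    """Lower bound on the share of data within ±k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2


def chebyshev_k_for_proportion(p):
    """Smallest k that guarantees at least proportion p (0 < p < 1)."""
    return math.sqrt(1 / (1 - p))


# Direction 1: factory output, mean 500, SD 20, interval 460 to 540
k_factory = (540 - 500) / 20
print(chebyshev_min_proportion(k_factory))  # 0.75

# Direction 2: delivery time, mean 40, SD 8, guarantee at least 84%
k_delivery = chebyshev_k_for_proportion(0.84)
low, high = 40 - k_delivery * 8, 40 + k_delivery * 8
print(round(k_delivery, 2), round(low, 1), round(high, 1))  # 2.5 20.0 60.0
```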

Normal Distribution and Skewness

The shape of the data distribution affects the choice of central tendency metrics and analysis methods.

Distribution Type | Shape | Relationship between Mean, Median, Mode | Typical Example
Symmetric (Normal) | Bell-shaped symmetric | The three coincide or are very close | Height, weight
Right-skewed (Positive) | Long tail on the right, peak on the left | Mode < Median < Mean | Income, housing prices, consumption amount
Left-skewed (Negative) | Long tail on the left, peak on the right | Mean < Median < Mode | Retirement age, high-score group in exams

When data is clearly skewed, the mean is pulled by extreme values at the tail, and the median is usually a more representative central tendency metric.
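A tiny example makes the pull of the tail visible. The income figures below are invented (in thousands) and contain one extreme value on the right.

```python
from statistics import mean, median

# Hypothetical monthly incomes in thousands; the last value is the long right tail
incomes = [30, 32, 35, 38, 40, 42, 45, 300]

avg = mean(incomes)
mid = median(incomes)
print(avg, mid)  # 70.25 39.0
```

The mean lands above seven of the eight values, while the median stays in the middle of the typical incomes, which is the right-skewed pattern Median < Mean from the table.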

Data Mining and Machine Learning

Data mining is finding patterns, correlations, and knowledge that can support decision-making from large amounts of data. Common methods include decision trees, cluster analysis, association rule mining, and machine learning.

Method | Focus | Common Application
Decision Tree | Using a series of judgment conditions for classification or prediction. | Customer churn prediction, credit risk classification
Cluster Analysis | Dividing similar data into groups. | Market segmentation, customer segmentation, product portfolio
Association Rule Mining | Finding relationships where items often appear together. | Market basket analysis, cross-selling
Regression Model | Predicting continuous numerical targets. | Sales amount prediction, demand prediction
Classification Model | Predicting discrete categorical targets. | Whether to churn, whether to default, whether to purchase

Machine learning can be divided into three categories.

Type | Data Characteristics | Goal | Example
Supervised Learning | Labeled data. | Learning the mapping from input to answer. | Classification, regression
Unsupervised Learning | No manually labeled answers. | Exploring potential structures in data. | Clustering, dimensionality reduction, association exploration
Reinforcement Learning | Learning through action and feedback. | Finding strategies that accumulate better rewards. | Dynamic pricing, path planning, resource scheduling

Regression predicts continuous values (e.g., tomorrow's sales), and classification predicts discrete categories (e.g., whether a customer will churn). Both belong to supervised learning, the difference being the type of target variable.

Data Splitting and Overfitting

When building a model, data is usually split into three parts so that a model which merely memorizes the training data by rote, but fails to generalize to new data, can be detected.

Split | Purpose
Training Set | The model learns its parameters from it; about 60-70% of the data.
Validation Set | Used to adjust hyperparameters and select model versions; about 15-20%.
Test Set | Final evaluation of generalization ability, used only once; about 15-20%.

Phenomenon | Characteristics | Countermeasure
Overfitting | Good performance on the training set, poor on the test set; the model memorizes noise. | Regularization, simplifying the model, increasing data, early stopping, cross-validation.
Underfitting | Poor performance on both training and test sets; the model is too simple. | Increasing features, increasing model complexity, reducing regularization.

Cross-Validation splits training data into K parts, each time using 1 part for validation and the remaining K-1 parts for training, averaging the results after repeating K times, commonly K=5 or K=10. It can estimate model performance more robustly when data volume is limited.
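The splitting step can be sketched in plain Python. This is a minimal illustration with a hypothetical function name; it splits by position without shuffling, whereas real projects usually shuffle first or use a library implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more sample
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train, validation in folds:
    print("validate on", validation, "train on", len(train), "samples")
```

Each sample is used for validation exactly once; the model would be trained K times and the K validation scores averaged.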

Regression Model Evaluation Metrics

Regression models commonly use several metrics to measure prediction error, the difference being sensitivity to "large errors" and ease of interpretation.

Metric | Full Name | Focus
MAE | Mean Absolute Error | Average of absolute errors; treats every error equally, less affected by extreme values.
RMSE | Root Mean Squared Error | Errors are squared, averaged, then square-rooted; magnifies large errors, used when extreme prediction errors must be severely punished.
MAPE | Mean Absolute Percentage Error | Expresses error as a percentage; can compare across different magnitudes, but distorts when actual values are close to 0.
R² | Coefficient of Determination | Proportion of overall variance explained by the model; looks at overall fit, not a penalty for single-record errors.
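The four metrics can be computed directly from their definitions. The sketch below uses made-up actual and predicted values, and the function name is illustrative.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, MAPE (in percent), and R² for paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e**2 for e in errors) / n)
    # MAPE divides by the actual value, so it breaks down when an actual is near 0
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100
    mean_actual = sum(actual) / n
    ss_res = sum(e**2 for e in errors)                  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


actual = [100, 200, 300, 400]
predicted = [110, 190, 300, 440]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

The single large miss (400 predicted as 440) pushes RMSE (about 21.2) well above MAE (15), which is the "magnifies large errors" behavior described in the table.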

Classification Model Evaluation Metrics

Evaluation of classification models starts from the Confusion Matrix, dividing prediction results into four cells: True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN), and then calculating metrics from them.

Metric | Formula | Question Answered
Accuracy | (TP + TN) ÷ Total | How many were predicted correctly overall? Distorts when classes are imbalanced.
Precision | TP ÷ (TP + FP) | Of those predicted positive, how many are actually positive?
Recall | TP ÷ (TP + FN) | Of those actually positive, how many were caught?
F1 Score | Harmonic mean of precision and recall | Combined metric when both need to be balanced.

Precision and recall usually pull against each other, with trade-offs based on error costs. For example, fraud detection has a high cost for missing, so it favors recall; marketing list mis-sending has a low cost but harasses customers, so it favors precision.
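A short sketch shows how the four metrics follow from the confusion matrix and why accuracy can mislead on imbalanced data. The counts are invented (18 real fraud cases among 1,000 transactions), and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Imbalanced example: only 18 of 1,000 cases are actually positive
m = classification_metrics(tp=8, fp=2, tn=980, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Accuracy is 98.8% even though more than half of the real positives were missed (recall is about 44%), which is why recall matters more in a fraud-detection setting.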

CRISP-DM Data Mining Standard Process

CRISP-DM (Cross Industry Standard Process for Data Mining) is a data mining methodology widely adopted in the industry, containing six stages, which can backtrack to the previous stage as needed.

Data Mining Process Methodology Comparison

Besides CRISP-DM, KDD (Knowledge Discovery in Databases) and SEMMA are also common data mining processes. KDD appeared earlier and is more academic; SEMMA was proposed by SAS, with the name taken from the initials of the five stages: Sample, Explore, Modify, Model, Assess. The goals of the three are similar, the difference being in stage division and orientation.

Methodology | Stages | Orientation
CRISP-DM | Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment | Business-oriented; starts with business goals, ends with deployment.
KDD | Selection → Preprocessing → Transformation → Data Mining → Interpretation and Evaluation | Data technology perspective, academic-oriented.
SEMMA | Sample → Explore → Modify → Model → Assess | Tool operation level, lacks business understanding and deployment stages.

Business Management Basic Knowledge

Business Environment and Strategic Management

Enterprises must evaluate the environment because organizational strategy, organizational structure, and market performance are all affected by the environment. Chandler's linkage theory can be simplified as "Environment → Strategy → Structure."

Perspective | Description
Strategic Perspective | Organizations adopt different strategies in stages such as startup, growth, maturity, and decline.
Market Perspective | Organizations turn inputs into outputs and must master the market to survive and grow.
Competitive Perspective | Enterprises must provide outputs more attractive than competitors and continue to profit.

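The enterprise environment can be divided into internal and external environments; the external environment is further divided into the specific environment and the general environment.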

Type | Factors
Internal Environment | Shareholders, board of directors, organizational culture, organizational structure, employee attitudes and values, management procedures and methods
Specific Environment | Customer groups, suppliers, competitor groups, financial institutions, shareholder groups, government, pressure groups
General Environment | Economy, politics and law, social culture, technology, population structure, natural ecology, internationalization

PEST and PESTEL Analysis

PEST/PESTEL is a common framework for analyzing the general environment, systematically taking stock of external influencing factors from multiple aspects.

Abbreviation | Aspect | Content
P | Political | Government policies, political stability, trade agreements.
E | Economic | GDP, interest rates, exchange rates, inflation, consumption power.
S | Social | Population structure, culture, lifestyle, education level.
T | Technological | Technology trends, R&D investment, automation, intellectual property.
E | Environmental | Climate change, sustainability regulations, energy, pollution (added in PESTEL).
L | Legal | Labor laws, consumer protection, fair trade, industry regulations (added in PESTEL).

PEST is the original version, and PESTEL (also called PESTLE) adds environmental and legal aspects, responding to the increasing importance of sustainable development and regulatory compliance issues.

Competition and Strategy Analysis Frameworks

Porter's Five Forces analysis is used to evaluate industry competitive pressure. All five forces are related to the enterprise's specific environment.

Five Forces | Core Question | Situation where this force is stronger
Supplier Bargaining Power | Can suppliers raise prices or worsen supply terms? | Suppliers are concentrated, few alternative sources, high switching costs, suppliers can integrate forward.
Customer Bargaining Power | Can customers lower prices or demand more services? | Buyers are concentrated or purchase in large volumes, products are standardized, low switching costs, buyers can integrate backward.
Threat of New Entrants | Is it easy for new entrants to enter the market? | Entry barriers are low: economies of scale, brand, capital, channel, and regulatory requirements are all modest.
Threat of Substitutes | Can other products or services replace existing needs? | Substitutes have high price-performance ratios, low customer switching costs.
Rivalry Among Existing Firms | Is competition within the industry intense? | Many competitors of similar strength, industry growth is slowing, fixed costs or exit barriers are high, product differences are small.
[Diagram: Porter's Five Forces, with four forces pointing at the central rivalry among existing firms]

The four arrows in the diagram point to the central rivalry among existing firms because the other four forces affect the intensity of competition within the industry. However, rivalry among existing firms is also one of the five forces; the five together determine the competitive intensity and profit space of the entire industry.

Porter's Three Competitive Strategies

After completing the Five Forces analysis, enterprises can choose corresponding competitive strategies based on the industry situation.

Strategy | Focus | Suitable Scenario
Cost Leadership | Offering comparable quality at the lowest cost, winning through economies of scale and efficiency. | Mass market, price-sensitive customers.
Differentiation | Providing products or services with unique value that customers are willing to pay a premium for. | Brand, design, technology, or service has differentiation potential.
Focus | Targeting a specific market segment, adopting cost or differentiation strategies within that segment. | Niche market, specific group, or geographic area.

SWOT analysis is used to take stock of internal and external conditions.

Type | Source | Description
Strengths | Internal | Favorable conditions possessed by the organization.
Weaknesses | Internal | Deficiencies or limitations within the organization.
Opportunities | External | Favorable factors in the external environment.
Threats | External | Unfavorable factors in the external environment.

TOWS Matrix (Strategic Deduction of SWOT)

SWOT only takes stock of the status quo; the TOWS Matrix further crosses internal and external factors to produce four strategic directions.

Strategy | Combination | Direction
SO Strategy (Offensive) | Strengths + Opportunities | Use internal strengths to seize external opportunities.
WO Strategy (Reinforcement) | Weaknesses + Opportunities | Reinforce internal weaknesses to seize opportunities.
ST Strategy (Defensive) | Strengths + Threats | Use strengths to resolve external threats.
WT Strategy (Retreat) | Weaknesses + Threats | Downsize, transform, or exit to avoid double pressure.

The BCG Matrix analyzes business units using market growth rate and relative market share.

Type | Market Growth Rate | Relative Market Share | Management Implication
Star | High | High | Needs investment to maintain growth.
Cash Cow | Low | High | Maturity stage, stable cash flow.
Question Mark | High | Low | Need to evaluate whether to increase investment to expand market share, otherwise abandon/exit.
Dog | Low | Low | Usually need to downsize, transform, or exit.
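The quadrant logic can be expressed as a small classification function. This is only a sketch: the cut-offs (10% market growth and a relative market share of 1.0) are assumed conventions for illustration, not values given in the handout, and the function name is hypothetical.

```python
def bcg_quadrant(market_growth, relative_share, growth_cutoff=0.10, share_cutoff=1.0):
    """Classify a business unit into a BCG quadrant.

    The cut-off values are illustrative assumptions; organizations choose their own.
    """
    high_growth = market_growth >= growth_cutoff
    high_share = relative_share >= share_cutoff
    if high_growth and high_share:
        return "Star"
    if high_share:
        return "Cash Cow"
    if high_growth:
        return "Question Mark"
    return "Dog"


# Hypothetical business units: (market growth rate, relative market share)
units = {"A": (0.15, 1.5), "B": (0.03, 2.0), "C": (0.20, 0.4), "D": (0.02, 0.3)}
for name, (growth, share) in units.items():
    print(name, bcg_quadrant(growth, share))
```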

Ansoff Growth Matrix

The Ansoff Matrix deduces four growth strategies from the new/old combinations of the two dimensions of "product" and "market."

Strategy | Product × Market | Description
Market Penetration | Existing Product × Existing Market | Increase purchase frequency of existing customers or market share.
Market Development | Existing Product × New Market | Expand existing products to new regions or new customer groups.
Product Development | New Product × Existing Market | Launch new products or new features for existing customers.
Diversification | New Product × New Market | Highest risk, stepping into new fields.

Manufacturing vs. Service Industries

Aspect | Manufacturing | Service
Output | Tangible products. | Intangible services.
Management Focus | Cost, quality, customization, and production efficiency. | Customer experience, process quality, and service consistency.
Quality Recognition | Easier to measure with objective specifications. | Often affected by interaction context and customer subjective feelings.
Customer Participation | Customers are usually not present during production. | Service provision and consumption often occur simultaneously.

Service industries have four common characteristics:

  • Intangibility: Services cannot be touched like physical products.
  • Inseparability: Service provision and consumption often occur simultaneously.
  • Variability: Service quality is affected by personnel, customers, and context.
  • Perishability: Unused service capacity cannot be stored.

Marketing and Product Management

Marketing is about creating exchange value, satisfying customer needs, and supporting enterprise profit. Common themes include marketing mix, consumer behavior, purchasing procedures, market segmentation, and product lifecycle.

Marketing Mix 4Ps

4P | Description
Product | Goods, services, brand, and added value provided to customers.
Price | Pricing strategy, discounts, and payment terms.
Place | How the product reaches the customer.
Promotion | Advertising, sales promotion, public relations, and sales communication.

4Ps Corresponding to 4Cs (Customer Perspective)

4Ps are from the enterprise perspective; 4Cs were proposed by Robert Lauterborn, reinterpreting the marketing mix from the customer's angle.

4P | 4C | Perspective Conversion
Product | Customer Value | From "what product we make" to "what value the customer needs."
Price | Cost | From "how much to price" to "total cost paid by the customer."
Place | Convenience | From "where to stock" to "is it convenient for the customer to obtain."
Promotion | Communication | From "one-way advertising" to "two-way communication."

STP Marketing Strategy

STP is the core process of marketing strategy, connecting market analysis and marketing mix design.

Step | Description
Segmentation | Dividing the market into different segments based on geographic, demographic, psychological, behavioral, and other variables.
Targeting | Selecting the target customer group the enterprise wants to serve from the segments.
Positioning | Establishing a differentiated brand impression and value proposition in the minds of the target customer group.

Consumer Purchase Decision Process

  1. Need recognition.
  2. Information search.
  3. Evaluation of alternatives.
  4. Purchase decision.
  5. Post-purchase satisfaction evaluation.

Factors Influencing Purchasing Decisions

Consumer purchasing decisions are affected by multi-level factors, and marketing strategies need to be designed based on the dominant factors of the target customer group.

Level | Factors
Cultural Factors | Culture, subculture, social class.
Social Factors | Reference groups, family, roles, and status.
Personal Factors | Age, occupation, economic status, lifestyle, personality.
Psychological Factors | Motivation, perception, learning, beliefs, and attitudes.

Product Lifecycle

Stage | Characteristics | Management Focus
Introduction | Market acceptance is still low, sales volume is limited. | Build awareness, lower entry barriers.
Growth | Sales volume increases rapidly, new users join, profit climbs simultaneously. | Expand market, increase market share.
Maturity | Market is stable, sales reach peak, but competition intensifies, profit has passed its peak and started to decline. | Control costs, differentiation, maintain cash flow.
Decline | Demand drops, product may be replaced. | Decide to eliminate, transform, or maintain niche market.

Plotting the sales volume and profit curves for the four stages shows that the profit peak usually occurs earlier than the sales volume peak. In the latter part of the growth stage, competition is not yet intense, and costs have decreased with scale expansion, so profit hits the top first; after entering the maturity stage, although sales volume climbs to the highest, competition intensifies, prices and gross margins decline, and profit has already started to fall back.

Product Lifecycle Curve

Product Diffusion: Types of Innovation Adopters (Rogers' Diffusion of Innovations)

Everett Rogers divided consumers into five types based on how early they adopt a new product; adoption timing is assumed to follow a roughly normal (bell-shaped) distribution.

Type | Proportion | Characteristics
Innovators | 2.5% | Willing to try new things, tolerate high risk and imperfection.
Early Adopters | 13.5% | Opinion leaders, influence subsequent group adoption decisions.
Early Majority | 34% | Adopt after deep thought, value practicality and word-of-mouth.
Late Majority | 34% | Follow up only after the majority have adopted, more conservative.
Laggards | 16% | Tradition-oriented, accept or reject adoption last.
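The five proportions are rounded areas under a normal curve, cut at 2 and 1 standard deviations before the mean adoption time, at the mean, and at 1 standard deviation after it. The sketch below recomputes them with `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # adoption time, in standard deviations from the mean

labels = ["Innovators", "Early Adopters", "Early Majority", "Late Majority", "Laggards"]
cut_points = [-2, -1, 0, 1]  # boundaries between adjacent adopter types

# Cumulative shares at each boundary, padded with 0 and 1 at the two ends
edges = [0.0] + [std_normal.cdf(c) for c in cut_points] + [1.0]
shares = {label: hi - lo for label, lo, hi in zip(labels, edges, edges[1:])}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

The exact areas are about 2.3%, 13.6%, 34.1%, 34.1%, and 15.9%, which Rogers rounds to the figures in the table.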

Research & Development and Human Resources

Research & Development (R&D) is an important activity for enterprises to maintain innovation and competitiveness, containing two major aspects: research and development.

Type | Description
Basic Research | Developing and verifying theories; results may or may not have practical application value.
Applied Research | Turning basic research results toward concrete applications and problem-solving.
Development and Engineering Design | Systematically applying knowledge and research results to products or services.
Experimental Production | Small-batch trial production to confirm design, molds, materials, quality, and delivery issues.

R&D is non-routine and high-risk. Investing resources does not guarantee patents, products, or market success, and the results must still pass the tests of mass production, marketing, channels, and customer acceptance.

Human resource management includes activities such as recruitment and selection, training and development, personnel movement, performance evaluation, rewards, and human resource retention.

Activity | Focus
Human Resource Planning | Understanding existing human resource supply, evaluating future demand, planning supply-demand gaps.
Selection | Finding suitable candidates through resumes, interviews, written tests, and background checks.
Employee Training | Stimulating self-learning, practice-oriented, employee participation, real-time feedback.
Performance Evaluation | Identifying performance problems, consequences, and improvement plans.
Compensation and Benefits | Includes monetary rewards and non-monetary rewards.

Accounting, Finance, and Financial Metrics

The difference between accounting and financial management lies in the time perspective and work focus.

Item | Accounting | Financial Management
Time Perspective | Focused on historical information organization and reporting. | Focused on future fund planning and financial decisions.
Work Focus | Recording, aggregating, reporting transaction activities. | Investment, financing, fund utilization, and risk management.
Users | Internal managers and external information users. | Managers, investors, creditors, etc.

Metric | Type | Description
Profit Margin | Profitability | Measuring how much profit is left per unit of sales.
Return on Investment (ROI) | Profitability | Measuring the return brought by investment.
Earnings Per Share (EPS) | Profitability | Earnings corresponding to each share of common stock.
Current Ratio | Solvency | Measuring short-term debt repayment ability, not a profitability metric.
Inventory Turnover | Operational Efficiency | Measuring the speed of inventory conversion.
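The handout describes these metrics in words only. As a hedged sketch, the functions below use the commonly taught textbook formulas, which are assumptions here rather than quotations from the handout, and all input figures are invented.

```python
def profit_margin(net_income, sales):
    """Profit left per unit of sales."""
    return net_income / sales


def return_on_investment(gain, cost):
    """Net gain relative to the amount invested."""
    return (gain - cost) / cost


def earnings_per_share(net_income, preferred_dividends, common_shares):
    """Earnings attributable to each common share."""
    return (net_income - preferred_dividends) / common_shares


def current_ratio(current_assets, current_liabilities):
    """Short-term solvency: current assets per unit of current liabilities."""
    return current_assets / current_liabilities


def inventory_turnover(cost_of_goods_sold, average_inventory):
    """How many times inventory is sold and replaced in the period."""
    return cost_of_goods_sold / average_inventory


# Hypothetical figures
print(profit_margin(150, 1_000))                        # 0.15
print(return_on_investment(130, 100))                   # 0.3
print(earnings_per_share(1_000_000, 100_000, 450_000))  # 2.0
print(current_ratio(500, 250))                          # 2.0
print(inventory_turnover(1_200, 300))                   # 4.0
```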

EOQ Economic Order Quantity Formula

Given annual demand D, ordering cost per order S, and annual holding cost per unit H, the economic order quantity is:

$$\mathrm{EOQ} = \sqrt{\frac{2DS}{H}}$$

EOQ is the single order quantity that minimizes the sum of "ordering costs" and "holding costs," assuming stable demand, fixed lead time, and immediate replenishment.
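A quick numerical check shows that the EOQ is the point where annual ordering cost and annual holding cost balance. The demand and cost figures below are made up, the function names are hypothetical, and the cost model (orders per year × S plus average inventory × H) is the standard one behind the EOQ formula, with H taken as the annual holding cost per unit.

```python
import math


def eoq(annual_demand, order_cost, holding_cost):
    """Economic order quantity: sqrt(2DS / H)."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)


def annual_cost(quantity, annual_demand, order_cost, holding_cost):
    """Ordering cost (D/Q orders per year) plus holding cost (Q/2 average stock)."""
    ordering = annual_demand / quantity * order_cost
    holding = quantity / 2 * holding_cost
    return ordering + holding


D, S, H = 10_000, 50, 4  # hypothetical annual demand, cost per order, holding cost per unit
q_star = eoq(D, S, H)
print(q_star)  # 500.0

for q in (400, 500, 600):
    print(q, round(annual_cost(q, D, S, H), 1))
```

Ordering 400 or 600 units at a time both cost more per year than ordering 500, and at 500 the ordering and holding parts are equal (1,000 each).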

Management Activities and Organization

Management is the process by which managers use organizational resources to achieve goals.

Achieving goals and using resources correspond exactly to the two aspects of measuring management performance:

  • Effectiveness: Whether the goal is appropriate, whether the right things are done.
  • Efficiency: The relationship between input and output, whether results are achieved with fewer resources.

Managers can be divided into first-line managers, middle managers, and top managers by level.

Different levels require different proportions of skills. First-line managers focus on technical skills, top managers focus on conceptual skills, and interpersonal skills are important at all levels.

Manager Skills | Description | Commonly Found In
Technical Skills | Using professional knowledge to perform work. | First-line managers need more.
Interpersonal Skills | Communication, cooperation, motivation, and leadership. | All levels need.
Conceptual Skills | Analyzing problems, evaluating plans, planning actions. | Top managers need more.
Political Skills | Building power bases and interest linkages. | Important when involving cross-departmental coordination.

Mintzberg's managerial roles can be divided into three categories.

Type | Role
Interpersonal Roles | Figurehead, leader, liaison
Informational Roles | Monitor, disseminator, spokesperson
Decisional Roles | Entrepreneur, disturbance handler, resource allocator, negotiator

Four functions of enterprise management:

Function | Description
Planning | Defining goals, formulating strategies and plans.
Organizing | Allocating resources, arranging work, and establishing responsibility relationships.
Leading | Motivating, communicating, coordinating, and handling conflicts.
Controlling | Monitoring performance, comparing goal and actual gaps, taking corrective measures.

PDCA Cycle (Deming Cycle)

PDCA is the foundation cycle for quality management and continuous improvement, proposed by Walter Shewhart and promoted by W. Edwards Deming, often regarded as the execution model of the four management functions.

Stage | Corresponding Management Function | Focus
Plan | Planning | Defining problems, setting goals, formulating action plans.
Do | Organizing + Leading | Executing according to plan, recording processes and data.
Check | Controlling | Evaluating effectiveness, identifying gaps with goals.
Act | Controlling + Planning | If successful, standardize; if not, correct and restart the cycle.

The extended version PDSA (Plan-Do-Study-Act) replaces "Check" with "Study," emphasizing in-depth analysis rather than simple checking. Six Sigma's DMAIC (Define, Measure, Analyze, Improve, Control) is an advanced version more focused on quality improvement projects.

Digital Enterprise Information Tools Basic Knowledge

Digital Enterprise and Digital Transformation

A digital enterprise is one that can use digital technology and information networks to communicate, collaborate, conduct electronic transactions, share data, and transform processes with customers, suppliers, and business partners. The focus of digital transformation is not just introducing tools, but changing operational processes, service delivery methods, and value creation models.

Operational Intelligence Information Technology

Technology | Focus | Operational Intelligence Application
AI | Supporting analysis and decision-making through machine learning, deep learning, and NLP. | Customer behavior prediction, inventory risk analysis, social media sentiment analysis.
Generative AI | Automatically generating text, images, and code using Large Language Models (LLM) and diffusion models. | Marketing copy, customer service dialogue, report summary, SQL auto-generation, knowledge Q&A.
Cloud Computing | Obtaining computing, storage, and platform resources on-demand via the network. | Expanding data platforms, reducing peak resource procurement pressure.
RFID | Tracking and identifying objects through radio frequency signals. | Material tracking, inventory counting, automated warehouse entry/exit.
IoT | Connecting sensors, equipment, and networks. | Equipment monitoring, smart logistics, smart agriculture, environmental monitoring.
Big Data | Processing data with high volume, velocity, variety, and uncertainty. | Marketing analysis, production risk analysis, customer insight.

Big Data 5V Characteristics

V | Meaning | Description
Volume | Large volume | Data volume reaches TB, PB levels.
Velocity | High speed | High frequency generation and processing requirements.
Variety | Diversity | Structured, semi-structured, unstructured coexist.
Veracity | Truthfulness | Data quality and credibility vary.
Value | Value | Finding parts in massive data that can be converted into business decisions.

The iPAS official study guide lists 4V (Volume, Velocity, Variety, Veracity) for big data characteristics; Value is an extension common in other textbooks, collectively called 5V.

AI / ML / DL / GenAI Scope Relationship

[Diagram: nested scopes of AI, ML, DL, and GenAI]

The diagram is a simplified scope inclusion relationship. ML is a subset of AI, DL is a subset of ML, and the current mainstream GenAI is also implemented using deep learning.

There are three standard cloud service models defined by NIST: IaaS, PaaS, and SaaS; the table below also lists FaaS, which is common in serverless architectures, though it is not one of the three NIST standards.

Service Model | Full Name | Supplier Management Scope | Customer Management Scope | Typical Service
IaaS | Infrastructure as a Service | Physical machine, network, storage, virtualization | OS, middleware, execution environment, application, data | AWS EC2, Azure VM, GCP Compute Engine
PaaS | Platform as a Service | IaaS scope + OS, middleware, execution environment | Application, data | Heroku, Azure App Service, AWS Elastic Beanstalk
FaaS | Function as a Service | PaaS scope + execution environment management, auto-scaling | Function code, trigger settings | AWS Lambda, Azure Functions, GCP Cloud Functions
SaaS | Software as a Service | PaaS scope + application | Data (user operation level) | Gmail, Microsoft 365, Salesforce

NIST also defines four deployment models, describing who the cloud infrastructure is for:

Deployment Model | Description
Public Cloud | Operated by cloud service providers, open for general public rental, multi-tenant shared resources.
Private Cloud | Dedicated to a single organization, can be self-built or outsourced, highest control and isolation.
Community Cloud | Shared by multiple organizations with common needs (e.g., regulations, security).
Hybrid Cloud | Combining two or more deployment models, e.g., using private cloud normally, scaling to public cloud during peaks.

RFID systems include electronic tags, readers, and application systems. Electronic tags are divided into active and passive:

Type | Power Source | Characteristics
Active RFID | Built-in battery. | Can actively send data, sensing range is larger.
Passive RFID | No built-in battery. | Generates current through electromagnetic induction after receiving the reader's signal, sensing range is smaller.

IoT three-layer architecture:

| Layer | Description |
| --- | --- |
| Perception Layer | Embeds components with sensing, identification, and communication capabilities into objects. |
| Network Layer | Receives the sensed information and transmits and processes it over wired or wireless networks. |
| Application Layer | Combines data with industry scenarios to provide specific services. |
Diagram

Organizational Levels of Information Systems

Enterprise information systems can be classified by the organizational level they serve, forming a pyramid structure from base-level daily transactions to high-level strategic decision-making. The higher the level, the more aggregated the data and the more unstructured the decisions faced.

Diagram
| System | Full Name | Service Level | Focus |
| --- | --- | --- | --- |
| TPS | Transaction Processing System | Operational | Recording and processing daily operational transactions, such as orders, shipping, invoicing, payroll. |
| MIS | Management Information System | Management | Aggregating TPS transaction data, generating routine management reports and summaries. |
| DSS | Decision Support System | Management | Supporting semi-structured decision-making with interactive models and analysis tools. |
| EIS | Executive Information System | Strategic | Integrating internal/external information, supporting high-level strategic decisions with dashboards (also called ESS). |
| ES | Expert System | Cross-level | Organizing specific domain expert knowledge into a rule base to assist judgment and diagnosis. |

Structured, Semi-structured, and Unstructured Decisions

Decisions are divided into three types based on "whether there is a clear processing procedure," which is also the basis for information system layering:

  • Structured decisions: Clear rules and steps, repeatable, and often automatable, such as inventory replenishment and payroll calculation. Mostly handled by TPS/MIS.
  • Semi-structured decisions: Partly rule-based and partly dependent on human judgment, such as budget compilation and marketing resource allocation. This is the home turf of DSS.
  • Unstructured decisions: Novel problems with no ready-made procedure, highly dependent on experience and judgment, such as new business investment and mergers and acquisitions. Mostly handled by EIS at the strategic level.

The higher up the pyramid, the more unstructured the decisions become.

TPS is the data foundation for other systems; MIS and DSS both rely on transaction data accumulated by TPS. The difference between MIS and DSS is that the former generates fixed-format routine reports to answer "what happened," while the latter performs interactive analysis through models to answer "what if we do this." The functional systems in the next section, such as ERP, CRM, and SCM, span multiple levels, containing both transaction processing and report analysis capabilities internally.
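The MIS/DSS contrast can be sketched in a few lines of Python. The transaction records and the `monthly_report` and `what_if` functions are hypothetical, chosen only to show a fixed report next to a what-if model running on the same TPS data; the percentage changes passed to `what_if` are assumptions the decision-maker supplies.

```python
from collections import defaultdict

# Hypothetical TPS output: one record per sales transaction.
transactions = [
    {"month": "2026-01", "units": 100, "unit_price": 50.0},
    {"month": "2026-01", "units": 40, "unit_price": 50.0},
    {"month": "2026-02", "units": 120, "unit_price": 50.0},
]


def monthly_report(records):
    """MIS style: a fixed-format summary answering 'what happened'."""
    revenue = defaultdict(float)
    for r in records:
        revenue[r["month"]] += r["units"] * r["unit_price"]
    return dict(revenue)


def what_if(records, price_change_pct, volume_change_pct):
    """DSS style: project total revenue under user-supplied assumptions."""
    price_factor = 1 + price_change_pct / 100
    volume_factor = 1 + volume_change_pct / 100
    return sum(
        r["units"] * volume_factor * r["unit_price"] * price_factor
        for r in records
    )


print(monthly_report(transactions))
# Scenario: raise prices 10% and expect to lose 5% of volume.
print(round(what_if(transactions, price_change_pct=10, volume_change_pct=-5), 2))
```

The report always returns the same shape of answer, while the what-if function is meant to be rerun with different scenarios.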

Common Information Systems for Digital Enterprises

| System | Full Name | Core Purpose |
| --- | --- | --- |
| ERP | Enterprise Resource Planning | Integrating internal enterprise finance, manufacturing, production, sales, HR, and other processes and data. |
| MRP | Material Requirements Planning | Calculating raw material procurement and production needs based on production schedules and bills of materials. |
| MES | Manufacturing Execution System | Connecting production schedules and on-site equipment, real-time grasp of work orders, output, and quality. |
| PLM | Product Lifecycle Management | Managing data and processes of products from design, R&D, mass production to elimination. |
| SCM | Supply Chain Management | Managing supply chain upstream and downstream planning and execution. |
| CRM | Customer Relationship Management | Managing customer data, interactions, sales channels, and service records. |
| EC | Electronic Commerce | Supporting electronic transactions, information sharing, and relationship maintenance. |
| KM | Knowledge Management | Supporting storage, retrieval, creation, transfer, and application of organizational knowledge. |
| BI | Business Intelligence | Collecting, integrating, analyzing, and presenting data to support decision-making. |
| RPA | Robotic Process Automation | Automatically executing tasks with clear rules and high repetition using software robots. |
| HRIS | Human Resource Information System | Managing recruitment, attendance, payroll, assessment, and personnel data. |

ERP

ERP is a finance and accounting-oriented integrated information system used to plan, control, and integrate enterprise resources and information, from order taking through manufacturing and shipping to settlement reports. It usually consists of integrated software modules and a centralized database.

Common ERP modules include:

  • Finance and Accounting.
  • Human Resources.
  • Manufacturing and Production.
  • Sales and Marketing.
  • Procurement and Inventory.
  • Supply Chain Management.

Manufacturing Information Systems: MRP, MES, and PLM

There are several information systems in the manufacturing field that often appear together, dividing labor and collaborating. ERP evolved from MRP (MRP → MRP II → ERP); MES and PLM are complementary systems connected to ERP, focusing on on-site execution and product/engineering data management respectively.

| System | Positioning | Relationship with ERP |
| --- | --- | --- |
| MRP | Calculating material procurement and production schedules based on master production schedules and bills of materials (BOM). | The predecessor of ERP, later expanded to MRP II covering capacity and finance, then evolved into ERP. |
| MES | Connecting ERP production plans and on-site equipment, real-time reporting of work order progress, output, and quality. | Fills the execution gap between ERP and on-site machines, and feeds actual production data back to ERP. |
| PLM | Managing engineering data and processes of products from design, R&D, mass production to elimination. | Provides product master data and BOM sources for ERP and MES to reference. |

MRP solves "when to prepare how much material," MES solves "how the site is actually performing," and PLM solves "how to manage product data itself." After connecting with ERP, they constitute a complete information chain for manufacturing from order taking, material preparation, production to shipping.
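The MRP question of "how much material" can be sketched as a net-requirement calculation. This is a minimal illustration only: the `BOM` contents, the stock figures, and the `net_requirements` function are hypothetical, and the sketch ignores lead times, lot sizing, and multi-level BOMs, which a real MRP run also handles.

```python
# Hypothetical single-level bill of materials: component quantity per finished unit.
BOM = {"desk": {"tabletop": 1, "leg": 4, "screw": 16}}


def net_requirements(product, planned_units, on_hand):
    """Gross requirement = planned units x quantity per unit.

    Net requirement = gross requirement minus stock on hand, never below zero.
    """
    needs = {}
    for component, qty_per_unit in BOM[product].items():
        gross = planned_units * qty_per_unit
        needs[component] = max(0, gross - on_hand.get(component, 0))
    return needs


on_hand = {"tabletop": 20, "leg": 500, "screw": 300}
print(net_requirements("desk", 100, on_hand))
# {'tabletop': 80, 'leg': 0, 'screw': 1300}
```

Here the planned quantity stands in for the master production schedule; legs need no purchase because existing stock already covers the 400 required.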

"Predecessor" means absorbed, not replaced

In information systems, saying MRP is the "predecessor" of ERP means that MRP's functions are covered by the broader ERP and become one of its modules, not that MRP disappeared after being replaced by another system. Therefore, ERP itself contains MRP's material calculation functions, and independent MRP systems are almost never seen on the market.

MES and PLM coexist with ERP because the three have different scopes and levels, each performing its own duties, exchanging data through interfaces rather than replacing each other. So ERP and MRP have an "inclusion" relationship, while ERP and MES/PLM have a "partner" relationship.

SCM

A supply chain covers all the participants, such as suppliers, manufacturers, retailers, and logistics providers, involved in moving a product or service from raw materials to delivery to the customer.

SCM systems can be divided into supply chain planning and supply chain execution:

  • Supply Chain Planning: Demand forecasting, material requirements, production planning, logistics distribution planning.
  • Supply Chain Execution: Managing the flow of products from distribution centers to warehouses and customers.

Supply chains can be divided into three types based on "what triggers production":

| Type | Production Trigger | Trade-off |
| --- | --- | --- |
| Push | Produce based on demand forecasts, then push to channels. | Inventory ready, fast shipping, but forecast errors cause surplus or shortages. |
| Pull | Produce or assemble only after receiving actual orders. | Low inventory, no overproduction, but customer waiting lead time is longer. |
| Push-Pull | Front end pushes based on forecasts, back end pulls based on orders. | Combines the efficiency of push with the flexibility of pull. |

The key to push-pull is the "push-pull boundary point." Before the boundary point, common parts or semi-finished products are prepared based on forecasts (push), and after receiving orders, final assembly or customization is done according to customer specifications (pull). For example, computer manufacturers prepare standard parts, and after receiving orders, assemble them according to configurations; the front end enjoys scale efficiency, and the back end retains customization flexibility.

Electronic Commerce

Electronic commerce is business activity supported by ICT to share information, execute transactions, and maintain relationships between buyers and sellers.

| Type | Full Name | Description |
| --- | --- | --- |
| B2B | Business to Business | Both buyers and sellers are businesses, such as procurement platforms between enterprises and suppliers, or raw material trading platforms. |
| B2C | Business to Consumer | Enterprises sell goods or services directly to consumers, such as online bookstores and brands' official online stores. |
| C2B | Consumer to Business | Consumer-led and enterprise-responsive, such as group buying to pool bargaining power, where consumers state a need and enterprises fulfill it. |
| C2C | Consumer to Consumer | Transactions between consumers with a platform as intermediary, such as auction sites and second-hand trading platforms. |
| B2E | Business to Employee | Enterprises provide services, information, and goods to employees through internal platforms, such as employee portals, employee welfare or shopping platforms, and online education and training. |
| O2O | Online to Offline | Online traffic or ordering, offline consumption or pickup, such as using online coupons in physical stores, food delivery, and reservations. |

E-commerce trends include socialization, mobilization, and localization. The positioning capability of mobile devices lets service providers offer location-based services that better fit the user's context.

KM

Knowledge management systems use information technology to support the storage, retrieval, creation, transfer, and application of organizational knowledge.

| Component | Description |
| --- | --- |
| Knowledge Base | Document libraries and information libraries, supporting capturing, organizing, storing, searching, and accessing knowledge. |
| Knowledge Map | Graphically presenting knowledge sources, storage locations, expert locations, tasks, and knowledge relationships. |

The two play different roles. The knowledge base is the content warehouse that stores knowledge, while the knowledge map is the index for finding it: it marks where knowledge is scattered and who holds it, and points to the knowledge base and other sources (including tacit knowledge in people's minds).

Knowledge maps are divided into three types based on how they organize organizational knowledge:

| Type | Organize Knowledge By |
| --- | --- |
| Conceptual Knowledge Map | Classified by concept or theme, presenting the hierarchy of knowledge and relationships between concepts. |
| Process Knowledge Map | Corresponds to enterprise processes, marking which knowledge is needed or produced by each process activity. |
| Competency Knowledge Map | Corresponds to personnel capabilities, presenting where experts are and the distribution of organizational knowledge capabilities. |

BI

BI collects, integrates, analyzes, and presents enterprise data to support decision-making. Its operating mechanism can be divided into three levels:

  • Data integration and cleaning: Aggregating sources such as ERP, CRM, external data, etc., and standardizing them.
  • Multi-dimensional analysis and visualization: Converting data into charts, dashboards, or interactive reports.
  • Real-time monitoring and decision support: Providing alerts, scenario simulation, and KPI tracking.
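The three levels above can be sketched end to end in plain Python. The source extracts, the field names, and the 1000 target are all hypothetical; a real BI stack would use ETL tooling, a data warehouse, and a dashboard product rather than hand-written functions.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems with inconsistent formatting.
erp_sales = [
    {"region": "north ", "product": "A", "amount": 1200},
    {"region": "South", "product": "A", "amount": 800},
]
crm_sales = [
    {"region": "NORTH", "product": "B", "amount": 300},
    {"region": "south", "product": "B", "amount": 150},
]


def integrate(*sources):
    """Level 1: merge the sources and standardize the region field."""
    return [
        {**row, "region": row["region"].strip().title()}
        for source in sources
        for row in source
    ]


def total_by(rows, dimension):
    """Level 2: aggregate the amount along one dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["amount"]
    return dict(totals)


def kpi_alerts(totals, target):
    """Level 3: flag every entry whose total falls below the target."""
    return sorted(key for key, value in totals.items() if value < target)


rows = integrate(erp_sales, crm_sales)
by_region = total_by(rows, "region")
print(by_region)                           # {'North': 1500, 'South': 950}
print(total_by(rows, "product"))           # {'A': 2000, 'B': 450}
print(kpi_alerts(by_region, target=1000))  # ['South']
```

Calling `total_by` with a different dimension is the simplest form of the multi-dimensional analysis described in the second level.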

CRM

CRM centers on customers, integrating data and processes of marketing, sales, and service. Main functions are divided into three categories:

  • Data integration and management: Centralized integration of customer contact information, purchase records, service records, and interaction history.
  • Marketing and sales support: Managing potential customers, sales pipelines, and marketing automation.
  • Service and interaction management: Recording customer service interactions, providing knowledge bases, and automated replies.

Three Types of CRM

| Type | Focus |
| --- | --- |
| Operational | Automating front-line marketing, sales, and service processes, facing customers directly. |
| Analytical | Analyzing customer data collected by operational CRM to do segmentation, value evaluation, and behavior prediction. |
| Collaborative | Integrating customer interactions and communication across multiple channels such as phone, web, email. |

Customer Lifetime Value (LTV)

Customer Lifetime Value (LTV) estimates the total profit a customer can bring during the entire period of interaction, which is the core metric for analytical CRM to measure customer value. Under the simplified model of subscription systems, LTV = Monthly Marginal Profit × Expected Duration in Months, where the expected duration in months can be estimated by the reciprocal of the monthly Churn Rate (1 ÷ Monthly Churn Rate).

LTV Calculation Example

A SaaS service has an average revenue per user (ARPU) of $30 per month, a gross margin of 80%, and a monthly churn rate of 4%. The expected duration is 1 ÷ 0.04 = 25 months and the monthly marginal profit is 30 × 80% = $24, so LTV = 24 × 25 = $600.
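The same calculation can be written as a small function, which makes it easy to see how sensitive LTV is to the churn rate. The function name `lifetime_value` and its parameter names are illustrative; the formula is the simplified subscription model described above, not a general LTV method.

```python
def lifetime_value(arpu, gross_margin, monthly_churn):
    """Simplified subscription LTV: monthly margin x expected lifetime in months."""
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in (0, 1]")
    expected_months = 1 / monthly_churn   # reciprocal of the monthly churn rate
    monthly_profit = arpu * gross_margin  # monthly marginal profit per customer
    return monthly_profit * expected_months


# The worked example: ARPU $30, 80% margin, 4% monthly churn.
print(round(lifetime_value(arpu=30, gross_margin=0.80, monthly_churn=0.04), 2))  # 600.0
# One extra point of churn shortens the expected lifetime from 25 to 20 months.
print(round(lifetime_value(arpu=30, gross_margin=0.80, monthly_churn=0.05), 2))  # 480.0
```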

RPA

RPA uses software robots to simulate human operations on system interfaces, automatically executing tasks with clear rules and high repetition. It is suitable for tasks such as data entry, file movement, report generation, and data conversion between systems. It is not equivalent to AI, but can be combined with AI to handle more complex document recognition and judgment processes.
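As a rough illustration of the kind of rule-based, repetitive work RPA targets, the sketch below converts records exported from one system into the layout another system expects. Everything here is hypothetical (the field names, the `INV-` prefix, the `convert_orders` function), and a real RPA tool would typically drive the applications' user interfaces rather than process text directly.

```python
import csv
import io

# Hypothetical export from an order system (in practice a file or a screen the bot reads).
SOURCE_CSV = """order_id,customer,amount
A001,Acme,1200
A002,Globex,850
"""


def convert_orders(source_text):
    """Apply the same fixed rules to every row: rename fields and reformat values."""
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["InvoiceRef", "Client", "Amount"])
    for row in reader:
        writer.writerow([
            f"INV-{row['order_id']}",      # rule 1: prefix the order id
            row["customer"].upper(),       # rule 2: upper-case the client name
            f"{float(row['amount']):.2f}"  # rule 3: two-decimal amount
        ])
    return out.getvalue()


print(convert_orders(SOURCE_CSV))
```

No judgment is involved at any step, which is what makes a task like this a candidate for automation without AI.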

Digital Transformation and Value Creation

A business model (BM) describes how an enterprise establishes and uses resources to provide valuable products or services to customers, thereby obtaining profits and creating enterprise value.

| Aspect | Description | Elements |
| --- | --- | --- |
| Value Proposition | Value commitment made by the enterprise to the target customer group. | Target customer group, product or service, customer value |
| Value Configuration | How internal and external resources form value creation processes. | Stakeholder network, key activities, customer relationships, distribution channels |
| Value Structure | Organizational, technical, and capability foundations supporting the business model. | Organizational structure, organizational culture, ICT, core resources, core capabilities |
| Value Finance | Cost, pricing, revenue, and profit logic. | Cost structure, pricing model, revenue structure, potential profit |

The enterprise Value Chain is a combination of activities from raw material acquisition to product or service delivery to customers. Porter's value chain divides activities into primary activities and support activities.

| Type | Definition | Included Activities |
| --- | --- | --- |
| Primary Activities | Directly participate in the production and delivery of products or services, adding value to products link by link along the process. | Inbound logistics, operations and production processing, outbound logistics, marketing and sales, after-sales service |
| Support Activities | Do not directly produce products, but span and support all primary activities, providing the resources and foundations they need. | Procurement, human resource management, technology development, enterprise infrastructure |

Simply put, primary activities are the linear process of products from material entry to after-sales, while support activities are the common foundation that horizontally supports this process. In the diagram below, primary activities are connected from left to right, and support activities span across them, which is exactly this structure.

Diagram

BPR Business Process Re-engineering

Business Process Re-engineering (BPR) means fundamentally rethinking how an enterprise works and radically redesigning its operational processes so that key performance indicators improve dramatically.

Four points of BPR:

| Point | Description |
| --- | --- |
| Fundamental | Re-asking why the enterprise does this thing and why it does it this way. |
| Radical | Redesigning processes from the ground up, not limited by existing rules. |
| Dramatic | Pursuing significant improvement, not just small incremental improvement. |
| Process | Focusing on cross-departmental operational processes, not just looking at organizational structure. |

The five-step model proposed by scholars Davenport and Short:

  1. Establish enterprise vision and activity goals.
  2. Identify processes that need redesign.
  3. Understand existing processes.
  4. Determine the function of information technology.
  5. Build a prototype of the new process.

The eight-step model proposed by scholars Gunasekaran et al.:

  1. Educate employees to recognize BPR.
  2. Select and establish a BPR promotion team.
  3. Evaluate existing processes and initially establish processes that need re-engineering.
  4. Determine measurement methods.
  5. Design re-engineered processes.
  6. Education and training.
  7. Simulation testing.
  8. Officially introduce new processes.

Change Management: Lewin's Three-Stage Model

For significant changes like BPR to be successfully implemented, supporting change management is required. Kurt Lewin proposed the most classic three-stage change model.

| Stage | Focus |
| --- | --- |
| Unfreezing | Establishing a sense of urgency for change, breaking status quo inertia, reducing employee resistance. |
| Changing (Moving) | Promoting new processes and new practices, providing training and support during the period. |
| Refreezing | Institutionalizing new practices, incorporating into performance evaluation, forming new habits. |

Kotter's Eight-Step Change Model offers a more concrete, implementation-oriented version:

  1. Establish a sense of urgency.
  2. Form a change promotion coalition.
  3. Propose a vision and strategy for change.
  4. Communicate the vision for change.
  5. Empower employees to take action.
  6. Create short-term wins.
  7. Consolidate achievements and promote more change.
  8. Embed new practices into organizational culture.

Change Log

  • 2026-06-11 Initial document creation.